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NETWORK TOMOGRAPHY FOR INTEGER-VALUED TRAFFIC 

By Martin L. Hazelton 
Massey University 

A classic network tomography problem is estimation of proper¬ 
ties of the distribution of route traffic volumes based on counts taken 
on the network links. We consider inference for a general class of 
models for integer-valued traffic. Model identifiability is examined. 

We investigate both maximum likelihood and Bayesian methods of 
estimation. In practice, these must be implemented using stochastic 
EM and MCMC approaches. This requires a methodology for sam¬ 
pling latent route flows conditional on the observed link counts. We 
show that existing algorithms for doing so can fail entirely, because 
inflexibility in the choice of sampling directions can leave the sampler 
trapped at a vertex of the convex polytope that describes the feasible 
set of route flows. We prove that so long as the network’s link-path 
incidence matrix is totally unimodular, it is always possible to select a 
coordinate system representation of the polytope for which sampling 
parallel to the axes is adequate. This motivates a modified sampler 
in which the representation of the polytope adapts to provide good 
mixing behavior. This methodology is applied to three road traffic 
data sets. We conclude with a discussion of the ramifications of the 
unimodularity requirements for the routing matrix. 

1. Introduction. Network-based transport models occur commonly in 
road traffic engineering and in the analysis of electronic communications 
systems [e.g., Castro et al. (2004), Denby et al. (2007)]. Such models can 
also be found in biological settings, for example, in the study of fungal net¬ 
works [e.g., Heaton et al. (2012)]. The implementation of network models 
gives rise to a variety of interesting and challenging statistical problems. In 
particular, it is frequently the case that observed data provide only indi¬ 
rect information on many model parameters of interest. We are then faced 
with network tomography, a term coined by Vardi (1996) to characterize the 
resulting types of inverse problems. This area has received significant atten¬ 
tion over the past 20 years, with important contributions by Vardi (1996), 
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Tebaldi and West (1998), Cao et al. (2000), Liang and Yu (2003), Lawrence, 
Michailidis and Nair (2006) and Airoldi and Blocker (2013), among others. 
Castro et al. (2004) and Kolaczyk (2009) provide overviews. 

A classical and heavily studied example of network tomography is the 
problem of estimating (mean) origin-destination (O-D) volumes from traffic 
counts at fixed network locations, sometimes referred to as volume network 
tomography. If there are multiple routes connecting some or all of the O-D 
pairs, then a more general version of the problem is to estimate the mean 
route flows. Inference would be straightforward if we observed the actual 
route flows, but the link count data determine them only up to a system of 
linear equations that is usually highly under deter mined. 

Much of the early work on this problem appeared in the transportation 
science literature, focusing on road traffic networks. Underdetermination of 
the route flows was addressed by either assuming the existence of service¬ 
able prior information [e.g., Maher (1983), Cascetta (1984), Bell (1991)] or 
by making very strong (and in some cases seemingly arbitrary) modeling 
assumptions [Zuylen and Willumsen (1980)]. One of the things that char¬ 
acterized much of this work was a focus on algorithms, with the underlying 
models often poorly specified. In particular, the distinction between the re¬ 
alized route flows and the mean value thereof was typically blurred. 

Vardi (1996) introduced a more statistically principled approach to the 
estimation of mean O-D flows. Working under the assumption of Poisson dis¬ 
tributed traffic, he demonstrated identifiability of these parameters from se¬ 
quences of traffic count data and expounded on the difficulties of likelihood- 
based inference. In particular, he noted that the Poisson likelihood requires 
a sum over all feasible route flow vectors: that is, route flows that are con¬ 
sistent with the traffic counts and solve the aforementioned linear system. 
Vardi (1996) showed that this set will be far too large to enumerate in any¬ 
thing other than toy problems. We note that the inconvenient form of the 
likelihood is not restricted to Poisson models. The likelihood function for 
general integer-valued traffic models can only be expressed as a sum over 
the (typically huge) set of feasible route flows. See Section 3.1 for details. 

The inferential problem can become far simpler if one is willing to em¬ 
ploy a continuous approximation to discrete traffic flows. In that case the 
likelihood can be expressed as an integral over all feasible route flows. For 
normally distributed flows this integral can typically be evaluated analyti¬ 
cally, a result that Vardi (1996) used to develop a method-of-moments type 
estimator for normal approximations to Poisson traffic models. Normal mod¬ 
els have formed the basis of the majority of work on O-D matrix estimation 
(and similar “passive” network tomography problems) when applied to elec¬ 
tronic communication networks. See, for example, Cao et al. (2000), Castro 
et al. (2004). In more complicated continuous flow models the likelihood 
and/or Bayesian posterior may not be available explicitly. However, even 
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then, working with continuous flows opens up MCMC sampling approaches 
that are not available in the discrete case. See, for example, Airoldi and 
Blocker (2013) on inference for their multilevel state-space models of time 
series of (large) traffic flows in communications networks. 

For road traffic examples (which provide the author’s motivating inter¬ 
est), continuous flow approximations are less attractive. In particular, even 
in large and busy road networks there will typically be large numbers of plau¬ 
sible routes with very small (often zero) traffic counts. With this in mind, 
we focus on models for integer-valued traffic flows. A natural approach to 
circumvent the impracticality of enumerating the feasible route flow set for 
such models is to sample therefrom. Tebaldi and West (1998) were the first 
authors to describe a comprehensive methodology of this type when they 
studied Bayesian inference for Poisson models using an MCMC algorithm 
that involved conditional sampling of the latent route flows. The crucial step 
in this work was the development of a componentwise method for drawing 
feasible candidate route flows. While this algorithm can work adequately in 
benign examples, Hazelton (2010) and Airoldi and Haas (2011) showed that 
it can mix very slowly in more difficult problems, and even fail entirely in 
some cases. Moreover, the poor practical performance that is sometimes ob¬ 
served highlighted critical gaps in our theoretical understanding of the route 
flow sampling problem in general and errors in the mathematical analysis 
of the properties of the proposed sampler in particular. Despite the pivotal 
importance of developing a reliable route flow sampler for implementing 
likelihood-based methods of inference, subsequent progress on this sampling 
problem in the discrete flow case has been limited. In particular, the only 
tangible advance has been restricted to networks with particularly simple 
topologies, like transit networks and trees. See Hazelton (2010). 

In this paper we study network tomography for integer-valued traffic flows 
for a rather general class of models. We examine model identifiability [gen¬ 
eralizing the results of Vardi (1996)] and consider both maximum likelihood 
and Bayesian inference implemented through sampling-based methods. We 
build on recent work by Airoldi and Haas (2011) and Airoldi and Blocker 
(2013) in the continuous flow case to provide geometrical insight in the 
route flow sampling problem. These methods help to provide a better under¬ 
standing of both the practical and theoretical aspects of Tebaldi and West’s 
(1998) sampler, and motivate a modified sampler with much improved prop¬ 
erties. 

Our work is largely focused on inference for static network parameters, 
based on either a single observed set of link counts or a sequence of such that 
can be regarded as a random sample for modeling purposes. Such problems 
are of significant interest in the context of road network planning, where, 
for example, static O-D matrices are frequently employed when examining 
the effects of proposed network changes. Nonetheless, the route flow sampler 
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that we develop could equally well be employed within algorithms for fitting 
time-varying models, such as those developed by Airoldi and Blocker (2013) 
for computer networks. 

The remainder of the paper is organized as follows. We introduce our gen¬ 
eral class of traffic models in the next section and examine the foundations 
of inference for them in Section 3. In Section 4 we study the properties of 
Tebaldi and West’s (1998) route flow sampler and introduce our modified 
version thereof. We illustrate the application of our methodology on traffic 
data from sections of the road network in Leicester in Section 5. We draw 
conclusions in Section 6, and discuss the practical and theoretical conse¬ 
quences of making convenient assumptions about the pattern of permissible 
routes through the network 

2. Traffic models. We consider a (weakly) connected network with m 
nodes and no links (numbered sequentially in both cases). Each traveler on 
the network makes a journey between an origin and destination node. Not 
all node pairs need be O-D pairs. We therefore introduce X to be the set 
of O-D node pairs, with cardinality c < m(m — l)/2. The elements of X are 
ordered lexicographically and so may be referenced by a one-dimensional 
index. 

Each O-D pair is connected by at least one route. In a big network with 
reasonable connectivity there will typically be a large number of possible 
routes. However, many of these may be implausible in practice, for example, 
because they contain cycles or are very circuitous. For the sake of parsimony 
we choose to ignore such routes in our model, and hence assume that a set 
of permissible routes has been determined a priori. We denote by IZk the 
set of (permissible) routes for O-D pair fcel, and let 1Z = [j k IZk be the 
set of all such routes with cardinality r = \1Z\. We assume some convenient 
ordering of the routes to allow one-dimensional indexing. 

Let x = (aq,..., x r ) T denote the vector of (integer-valued) traffic flows 
on these routes during some measurement period. These are not observed 
directly. Instead our data comprise traffic counts on n < no monitored links 
of the network. We denote these link counts by y = (yi ,..., y n ) T . They are 
related to the latent route flows by 

(1) y = Hx, 

where A = (a^-) is the link-path incidence matrix defined by 

f 1, if link i forms part of route j, 

0/^ j — S 

[ 0, otherwise. 

Typically n<r so that (1) is a highly underdetermined linear system. We 
denote the set of nonnegative solutions of (1) (i.e., the set of feasible route 
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flows given link counts y) by X\ y = {x: y = Ax, x > 0}, where the inequality 
is to be interpreted elementwise. 

For many problems it is natural to develop a statistical model of the traffic 
system for which the parameters relate directly to the route flows. We focus 
on such models and denote by fx('\Q) the joint probability mass function 
for x. We assume that the support of fx is X = Z> 0 . 

Example 1. For estimation of mean O-D traffic flows 6 = (#i,. .. , 0 C ) 
in networks with fixed routing (so that r = c and route flows are identical 
to O-D flows), Vardi (1996) and Tebaldi and West (1998) both assumed 
that an,... ,a; r are independent with Xj ~ Pois(0j). Traffic flows on real road 
systems are often overdispersed in comparison to a Poisson model [e.g., 
Hazelton (2001)], so use of a negative binomial model is a plausible alterna¬ 
tive. 

For a network with multiple routes per O-D pair, let z = (z±,..., z c ) T be 
the vector of O-D flows so that Zk = Yhjex k x r F° r estimation of p = E[z] 
we could proceed by aggregating results from a model where each route flow 
is separately parameterized. For example, if we employ the Poisson model 
Xj ~ Pois(0j) for independent route flows xi,...,x r , then /q, ; = Yhj&i k Qj- 
We also obtain route choice probabilities as a by-product. Specifically, the 
probability that a traveler for O-D pair k selects route j £ is simply 
pj = Oj/Pk(j)i where we use the notation k(j) to emphasize that route j 
connects O-D pair k. 

An objection to this modeling approach is that it greatly exacerbates the 
underdetermination of the fundamental linear system (1), rendering statis¬ 
tical inference all the more difficult. An alternative is to employ a model 
for the route choice probabilities p = (pi,... ,p r ) T that is either partially or 
completely specified exogenously. The Markov routing model examined by 
Vardi (1996) and Tebaldi and West (1998) does this in essence by defining 
the route choice probabilities as the product of “turning probabilities” at 
each node encountered en route. However, the suitability of the underlying 
Markov assumption for real road systems is highly questionable, not least 
because it permits routes with (multiple) cycles. If we have travel costs for 
the possible routes, then an alternative is to employ random utility models 
[e.g., Ben-Akiva and Lerrnan (1985)]. Examples from the transport research 
literature include various forms of logit route choice model [e.g., Daganzo 
and Sheffi (1977), Cascetta et al. (1996), Koppelman and Wen (2000)] and 
probit methods [e.g., Yai, Iwakura and Morichi (1997)]. 

Even if a lightly parameterized route choice model is used, the number 
of O-D pairs c will typically exceed the number of monitored links n by a 
large margin. It follows that if we observed just a single link count vector 
y, then we will require additional information in order to obtain unique 
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point estimates. In principle, more can be learned from link count data 
{yd); t = 1,2,..., IV} collected over a sequence of observation periods. The 
subsequent analysis is most straightforward if y) 1 ),... , y) w ) are assumed to 
be independent and identically distributed. An alternative is to model the 
inter-period dynamics of the traffic flow as a Markov process [e.g., Cascetta 
(1989)]. However, for most commonly used traffic models of this type, the 
nature of the inferential problems remains essentially the same [see Parry 
and Hazelton (2013)]. 

3. Tools for statistical inference. 

3.1. Model likelihood and identifiability. Consider modeling a single link 
count vector y. We can derive the likelihood function, L, by conditioning on 
the latent trip vector: 

L(0) = fy( y\0) 

(2) 

= X]/ri*( y l x ’ 6, )'M x l 0 ) = Y /*( x I 0 )- 

x xG -*|y 

The third equality follows from the fact that f Y |x(y|x, 0) is the indicator 
function for the constraint y = Ax,x > 0. Notice that this is independent of 
6. For general integer-valued traffic models it is not possible to simplify (2). 
This means that exact computation of the likelihood requires enumeration 
of the feasible route flow set ft] y = {x: y = Ax, x > 0}. For even moderately 
sized networks this will typically be computationally infeasible. 

Example 2. Consider a sequence of 52 consecutively numbered nodes 
connected in series, with the first two being the origins and the last 50 be¬ 
ing the destinations of travel. Suppose that 25 vehicles originate at each 
of the first two nodes, and that each of the remaining 50 nodes is the 
destination for a single vehicle, so that the link count vector is given by 
y = (25,50,49,48,..., 2,1). Despite the simplicity of the network and the 
presence of fixed routing, there are nonetheless (^) > 10 14 elements in the 
set T| y . 

The likelihood from a random sample of N traffic count vectors is 

N 

( 3 ) m = n y 

t=1 x (i) e\ (i) 

Again, we cannot usually expect any simplification of his function. This has 
ramifications for the existence of nontrivial sufficient statistics. For example, 
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suppose that we employ the Poisson model from Example 1. If we were to 
observe the route flows directly, then x = N —1 would be a sufficient 

statistic. However, with just link count data, the only sufficient statistic is 
the set of raw observations {y^;t = 1,2,... ,IV}. In particular, the mean 
link count vector y is certainly not sufficient for G. 

It is intuitively obvious that the full sequence of link counts contains more 
information than the mean vector. In particular, the pattern of dependence 
between link counts is illuminating in untangling the indeterminacy prob¬ 
lem. As an illustration based on Example 2, note that independence of y\ 
(the number of travelers leaving node 1) with all of U 27 , ■ ■ ■ ,y 5 i (the num¬ 
bers of travelers arriving at nodes 28 to 52, resp.) occurs if and only if all 
travelers originating at node 1 are destined for nodes numbered in the range 
3 to 27. What is less immediately apparent is whether the information con¬ 
tained in the link counts is generally sufficient to make the model parameters 
identifiable. 

Vardi (1996) addressed this issue in a particular case. Specifically, he 
examined the identifiability of G from link count data in networks with 
fixed routing when the route flows are independent with Xj ~ Pois(@j) for 
j = 1,... , r. He showed that if the columns of A are distinct and each has 
at least one nonzero entry, then G is identifiable. Subsequent work on iden¬ 
tifiability has focused primarily on models for computer networks, where 
the large traffic counts justify the use of continuous flow distributions with 
support that is not explicitly bounded at zero. The most comprehensive con¬ 
tribution is due to Singhal and Michailidis (2007). They derived results that 
can be applied to models with certain types of spatio-temporal dependence 
between route flows, but placed restrictions on the traffic routing schemes 
and network structure. 

We seek to extend Vardi’s (1996) result to general routing schemes and 
discrete traffic distributions while maintaining the generality of permissible 
network structures (in terms of topology and placement of traffic counters). 
As the following theorem shows, the assumptions of Poisson route flows and 
fixed routing can be relaxed while maintaining identifiability of the model 
parameters. 

Proposition 1. Let the columns of A be distinct, with each containing 
at least one nonzero element. Assume that the route flows are independent 
and that the marginal distribution of each has support equal to the non¬ 
negative integers. Then if the model parameter vector is identifiable from 
independent observations on x, it is also identifiable from independent ob¬ 
servations on y. 


The proof is given in the Appendix. 
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In principle, this is a reassuring result, indicating that if we observe an in¬ 
dependent sequence of link counts over our network, then we can eventually 
hope to obtain unique parameter estimates despite the underlying structural 
ambiguities. However, the nature of our constructive proof suggests that it 
could require an excessively long sequence of observations on y to do so. 
In practice, we may have limited ability to untangle the elements of 0 from 
even quite lengthy sequences of link counts. 

3.2. Sampling-based inference. Exact inference based on the model like¬ 
lihood is typically not possible because enumeration of the set X\ y is compu¬ 
tationally infeasible. A natural alternative is to use an approximation to the 
likelihood where the summation over all elements of X\ y in (2) is replaced by 
a sum over some suitable sample therefrom. The essence of this idea can be 
implemented using the stochastic EM algorithm for purely likelihood-based 
inference, and via MCMC methods in a Bayesian setting. 

We consider first the EM algorithm when a random sample y^,... , y^ 
of link counts is available. Regarding the route flows as missing data, the 
complete data likelihood is 

N N 

= n/x(x (f) |0) 

t=i t= i 

since y 1 W is a deterministic function of x®. The expectation of the complete 
data log-likelihood computed with respect to the distribution of the route 
flows conditional on the link counts is given by 

N 

(4) Q(O\0') = Y^ 1 o s{/a(x ( ' |0)}/x|y(x (t) |y (t) ,0 / ) 

t=i x(*)G* |y(t) 

for any given O' . The algorithm proceeds by iterating between finding the 
maximizer 6 of Q and computation of Q(0\0 r ) with O' reset to equal 0. It 
converges to the maximum likelihood estimate 0. 

Evaluation of the conditional expectation Q(0\0') is difficult because of 
the summation over feasible route flow sets in (4), although for Poisson 
models the problem can be simplified in certain special cases as noted by 
Vanderbei and Iannone (1994) and Li (2005). As an alternative, one could 
consider using a normal approximation if the flows are not too small [e.g., 
Vardi (1996), Li (2005)]. A more generally applicable approach is to approx¬ 
imate Q(0\0') by computing the mean of each term log{/x(x^ |0)} over M 
simulations drawn from /x|y( x ^ |y , O'). Accordingly, the stochastic EM 
algorithm works by replacing Q(0\0') by 

M N 

Q(o\o') = m~ 1 Y j Y, lo S {Mx* W \0)}, 

i =1 t= 1 


( 5 ) 
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where is a random sample from fx\Y( m \y^\ #0- Importantly, 

when implementing the stochastic EM algorithm in practice, the number of 
simulations M should adapt as the algorithm progresses in order to ensure 
convergence. See Caffo, Jank and Jones (2005). This is illustrated in the 
application studied in Section 5.1. 

Standard errors for 6 can be obtained via the missing information prin¬ 
ciple in the usual way [e.g., Louis (1982), Tanner (1996)]. The observed 
information matrix is estimated by 

i 0 bs = 4bs(^;y (1) ,---,y (iV) ) 


( 6 ) 


1 

M 


M 




;x 


*(i) 


*(N) 

V 


) 


-u(0;x) 


(i) 


.x. 


(NY 


u(0;x] 


(i) 


.x. 


(N) 


) T ), 


where u(0;x 


■( 1 ) 


,x 


*(AJ, _ 


) = YltLi 51og{/x(x*^ \6)}/d6 is the complete 


data score vector and /(0;x*^\... ,x*^) = — <9u(0;x*^, ... ,x*^)/<90 T 
the complete data information matrix. The (approximate) variance-covariance 
matrix of 6 is given by I~^ s . 

Turning to Bayesian inference, suppose that we have available a prior 
n(0) for the model parameters. Exact computation of the posterior p(6 |y) oc 
n(0)L(d) will generally be infeasible because of the difficulties in computing 
the likelihood. Computation of the normalizing constant for p(6 |y) is an 
additional problem. We therefore resort to MCMC methods to generate 
posterior samples. 

Following the lead of Tebaldi and West (1998), we avoid the need to enu¬ 
merate all feasible route flows by sampling from the joint posterior of 6 and 
x^ 1 ),... , x^. Working in the case of origin-destination matrix estimation 
with N = 1, Tebaldi and West (1998) proposed a Gibbs sampler, iterating be¬ 
tween draws of 6 and x from their respective conditional distributions. The 
former conditional simplifies: p{6 |x, y) =p{6 |x) because y is determined by 
x. It follows that conditional sampling of 6 will typically be straightforward. 
For instance, for the Poisson traffic model in Example 1 we get a gamma 
conditional for 6 when using conjugate gamma priors for the components of 
6. We may employ Metropolis-Hastings sampling for nonconjugate models. 
At the second stage of each iteration, conditional sampling of x requires 
draws from /x|y('|y^)- 

Applying the same methodology to alternatively parameterized traffic 
models and to sample sizes N > 1 presents no further problems in principle, 
although there is flexibility as to the order in which variables are updated. 
For example, one can choose to intersperse updates of 6 between draws of 
each of xd) 5 ... , x^. Indeed, there is considerable flexibility in the overall 
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design of the sampler that may lead to improved convergence properties. For 
instance, in the context of origin-destination matrix estimation, Airoldi and 
Blocker (2013) recommend modifying Tebaldi and West’s (1998) scheme to 
use a Metropolis-Hastings algorithm where candidate pairs (0j,Xj) are sam¬ 
pled. 

4. Conditional sampling of route flows. 

4.1. Overview of the problem. The methods of sampling-based inference 
discussed in the previous section all share a common problem: the need to 
sample efficiently the latent route flow vector x given observed link counts y. 
The development of an effective method of doing so has proved challenging. 
The only published methodology for integer-valued flows on general networks 
is due to Tebaldi and West (1998). More recently, Airoldi and Haas (2011) 
and Airoldi and Blocker (2013) made progress on the continuous flow version 
of the problem. In this section we review these methods and give examples 
where Tebaldi and West’s (1998) sampler fails. By examining the geometry 
of the feasible set X\ y we are able to explain this behavior and also prove 
results characterizing the conditions under which convergence is guaranteed. 
This in turn leads us to propose a modified version of Tebaldi and West’s 
(1998) methodology with far better properties. 

We frame the problem in terms of a Metropolis Hastings sampler for the 
conditional distribution /x|y('|y>0)j the current state of which is x £ X\ y . 
The general approach is to generate a candidate vector xf from a proposal 
distribution q with support X\ y which is then accepted with probability 
min(a, 1), where 

_ /.Y|y( xt |y>0)g(x) 

/x|y( x |y,0)g(xt) 

m = /x,y(x t ,y|^)g(x) 

/x,y(x,y|0)?(xt) 

= /x(x f |0)g(x) 

/x(x|0)g(xt)' 

Note that the final equality in (7) holds only if x^ is feasible, which will 
always be the case if q has support Xj y . It is possible to employ q with 
support that extends beyond X\ y on the understanding that any infeasible 
vectors xf are automatically rejected. Nonetheless, this will be a practical 
option only if the effective support of q is a reasonable approximation to 
Xi y , otherwise the acceptance rate is liable to be unacceptably low. 

Such a sampler can be initialized at a feasible route flow vector in a 
number of ways. For example, we may use standard integer programming 
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methods to obtain the optimal element of X\ y against some prespecified 
criterion. 


4.2. Tebaldi and West’s sampler. Assume that the rows of A are linearly 
independent. (If this is not the case, then it indicates that one or more of the 
link counts is redundant and can be omitted from the analysis without any 
loss of information.) Then the observed link counts place n linear constraints 
on the route flow vector x. As Tebaldi and West (1998) note, if we reorder 
the routes (and hence columns of A) in a suitable manner, then we can write 
(1) as 


( 8 ) 


y = [A 1 \A 2 ] 


xi 

x 2 


Aixi + A 2 x 2 , 


where A\ is an n x n invertible matrix. It follows that 


(9) 


xi = A 1 1 (y-A 2 x 2 ), 


indicating that we need sample only elements of the (r — n)-dimensional 
vector x 2 . 

The conditional distribution of x 2 given y has support 

X 2 \ y = {x 2 : y = Ax, x > 0} = {x 2 : A^ 1 (y - A 2 x 2 ) > 0, x 2 > 0}, 

where the vector inequalities are to be interpreted elementwise. Attempting 
to sample x 2 en bloc would require a convenient characterization of X 2 \ y . 
which we do not have. Tebaldi and West (1998) therefore suggested compo¬ 
nentwise sampling of x 2 . In describing this technique we employ the route 
ordering implied in (8) so that xi = (x'i,... ,x n ) T and x n+ j is the jth ele¬ 
ment of x 2 for j = 1,..., (r — n). We let x 2j _j denote the vector x 2 with its 
)th element omitted. 

Let us then consider updating x 2 by sampling x' n+ - conditional on x 2j _j 
and y. As Tebaldi and West (1998) showed, the conditional support of x' n+ - 
is a finite sequence of contiguous integers. They did not compute the end¬ 
points of this sequence explicitly, relying instead on an acceptance-rejection 
methodology to generate a feasible candidate. However, computation of the 
endpoints is straightforward, as we discuss later. We may therefore sample 
x' n+ j from a distribution q 2 j with appropriate support. Defining xt to be 
x 2 with just the jth. component updated in this manner, we then compute 
xj according to (9) to give the full candidate vector of route flows x^. This 
is accepted with probability min(aj, 1), where 

fx\Y(x*\y,0)Qj(x n +j) fx(^\0)qj(x n+ j) 
fx | y ( x I y, e ) Qj (® l +j ) fx (x 1 0 ) qj ( x f n+j ) 


( 10 ) 
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Fig. 1. An example four link network. 


For models with a priori independent route flows, the contributions to fx 
on the top and bottom will cancel for all routes except 1 , 2 ,..., n, n + j. 

Sequentially sampling the elements {x n +j,j = 1,... ,n — r} in this man¬ 
ner will guarantee eventual sampling from fx\y{'\y^) so long as ^ 2 |y is 
connected, in the sense that it is possible to move between any two ele¬ 
ments of this space by a sequence of moves parallel to the coordinate axes. 
Unfortunately this is not always the case. 


Example 3. Figure 1 depicts a shortened version of the series network 
examined in Example 2. We assume that nodes 1 and 2 are the only origins 
of flow, and nodes 3, 4, 5 are the only travel destinations. If we order the 
routes lexicographically by origin then destination, the (partitioned) link- 
route incidence matrix is given by 

0 O' 

1 1 
1 1 ' 

0 1 


( 11 ) 


A = [v4i | v4 2 ] = 


1110 
1111 
0 110 
0 0 10 


Note that the matrix A\ in this partition is invertible, as required for appli¬ 
cation of Tebaldi and West’s (1998) sampler. 

Suppose that we observe link counts y = (10,20,20,10) T . Given that 
node 3 is not a source of travel, we can immediately infer that the route 
flows destined for this node are both zero, that is, x\ = X 4 = 0. It is then 
straightforward to show that the feasible route set is defined by X 2 \ y = 
{x 5 + x& = 10}, as displayed in the left-hand panel of Figure 2. In this sit¬ 
uation, Tebaldi and West’s (1998) sampler will fail entirely, since there are 
no feasible steps in the coordinate directions of X 5 and xq. 

Such an extreme situation could be avoided by pre-checking whether any 
route flows are uniquely determined by the observed link counts. How¬ 
ever, even if we discount such cases, it is simple to construct examples 
where the sampler mixes extremely slowly. For instance, suppose that y = 
(10,20,19,9) T , implying that a single traveler is destined for node 3. The 
corresponding set of feasible route flows is defined by X 2 \ y as displayed in 
the right-hand panel of Figure 2. Generation of candidates in the coordinate 
directions of X 5 and xq will allow route flows to change by no more than one 
unit, so that exploration of the entirety of Xj y will be a laborious business. 
Moreover, we can design cases where mixing becomes arbitrarily slow by 
increasing flows for all routes except those destined for node 3. For instance, 
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Fig. 2. Feasible route sets in terms of flows for routes 5 (from node 2 to node 4) and 6 
(from node 2 to node 5) for the network in Figure 1. The left-hand panel corresponds to 
the case y = (10,20,20,10) T ; the right-hand panel to y = (10,20,19,9) T . The interior of 
the convex hull of each feasible set is shaded as an embellishment to help emphasize shape. 

if y = (1000,2000,1999, 999) t , then the feasible set will appear as a much 
thinner version of the right-hand panel of Figure 2. 

Working in the context of continuous traffic flow modeling, Airoldi and 
Blocker (2013) addressed these problems by selecting a random search direc¬ 
tion in ff 2 | y . A candidate x is then sampled from along the correspondingly 
oriented line segment. However, adapting this approach to integer-valued 
flows is not straightforward, since appropriate discretization of the sampling 
probabilities requires computationally expensive information on the local 
geometry of A| y . Furthermore, this random directions algorithm will mix 
poorly in cases like Example 3 because “long moves” will be achieved only 
infrequently when a fortuitous orientation is proposed. 

4.3. A modified route flow sampler. Intuitively, we can think about the 
route flows corresponding to the columns of A\ as providing a “swap space” 
in Tebaldi and West’s (1998) algorithm, in the sense that if we wish to trans¬ 
fer travelers between routes, then this can only occur by swapping the trav¬ 
elers in and out of this space. When seeking to update x n+ j (for j > 1), the 
value of this flow can only increase or decrease to the extent that we can 
obtain travelers from, or donate travelers to, the route flows in the swap 
space. Looking back at Example 3, the problem is that x\ and X 4 can take 
only a very small range of values. It is the resultant lack of slack in the swap 
space that prevents the sampler from taking large steps and mixing well. 

This discussion motivates a simple modification to Tebaldi and West’s 
(1998) algorithm. The decomposition of A will typically be far from unique, 
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even given that A\ must be invertible. We should then seek to reorder the 
routes, and hence adjust the partition of A, so as to increase the slack 
available in the swap space. We can demonstrate formally that this will work, 
in the sense that it is always possible, and moreover practicable, to find a 
partition of A with sufficient slack to allow the sampler to mix adequately. 
Our developments make use of some results in integer geometry, building on 
the work of Airoldi and Haas (2011). 

In what follows, we make the assumption that the matrix A is totally 
unimodular. That is, each invertible square submatrix (and hence any A\ 
that we consider) is integer valued. The requirement that Aj" 1 be an integer 
matrix is implicit in the work of Tebaldi and West (1998), since otherwise 
there is no certainty that all the sampled route flows will be integers. This 
assumption is explicitly stated in the theoretical work of Airoldi and Haas 
(2011) and Airoldi and Blocker (2013). As the last mentioned authors note, 
total unimodularity appears to be a very common property of link-route 
incidence matrices; it holds for all examples that they found in the literature. 
Nonetheless, it is not assured, a matter that we discuss in more detail in 
Section 6. 

Consider any partition A = [A 1 IA 2 ] for which A\ is invertible, and define 


( 12 ) 


U = 


—A 1 1 Ai 

Ir—n 


where I r - n is the (r — n)-dimensional identity matrix. As Airoldi and Haas 
(2011) note, AU = 0, so that the columns of U generate the null space of A. 
Moreover, because A is totally unimodular, U is integer valued. Now, the 
jth column U, of U (for j = 1,... , r — n) is zero in all components below 
the nth, apart from u n +jj = 1. It follows that if x € X\ y , then x =p u j is 
potentially the vector of flows resulting from transferring a single traveler 
to or from the swap space to route n + j. We say potentially because there 
is no certainty that x u, will be feasible. 

Tebaldi and West’s (1998) algorithm works by iteratively sampling in the 
directions ui,..., u r _ n . For a given partition of A this will work so long 
as movement from x G X\ y is possible in at least one of those directions. If 
movement is possible parallel to Uj, then, as noted before, the (conditional) 
support of a candidate x' n+ j is a contiguous integer sequence {xio ; Xio + 
1 ,..., Xhi}j where the endpoints are given by 


and 


Xio = max(— max{— x* : Uj = 1}, 0) 


Xhi = max(min{x* : Uj = -1}, 0), 

in which (x|,..., x* ) T = x* = Aj( 1 (y — A2-jX.2,-j) with A 2 -j being the ma¬ 
trix A 2 with the jth column deleted. 
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Nevertheless, as we saw in Example 3, there is no guarantee that move¬ 
ment is possible in any of the aforementioned directions. We note that this 
contradicts the irreducibility result given in the Appendix of Tebaldi and 
West (1998), but the proof therein is flawed since it relies on the fallacious 
premise that if all the elements of a sum of vectors are nonnegative, then 
at least one of the summands must have no negative elements. What we 
will show is that when the algorithm is stuck at x for a given ordering of 
the columns of A, there is always an alternative partition [A 1 JA 2 ] such that 
movement is possible parallel to a column of the adjusted U. 

Let us temporarily relax the requirement that the route flows be inte¬ 
ger valued. Then geometrically, the feasible set X y for real-valued flows is 
formed by the intersection of the linear manifold {x: x = Ay} with the non¬ 
negative orthant {x : x > 0}. The resulting set forms a convex polytope [see 
Ziegler (1995), e.g.]. This is an (r — n)-dimensional object embedded within r 
dimensional space. It can be characterized by the convex hull of its vertices, 
where x G Xj y is a vertex of the polytope if it has r — n zero coordinates. 
Furthermore, we have the following: 

Lemma 1 [Airoldi and Haas (2011)]. The vertices of Xj y are integer 
valued even when the route flows are continuous. 

We note in passing that while this result is stated and proved in Airoldi 
and Haas (2011), it has been known for some time. Equivalent results can 
be found in Hoffman and Kruskal (1956) and Veinott and Dantzig (1968). 

Suppose x is a vertex. We can reorder the columns of A so that the last 
r — n entries of x are zero. (The matrix A\ under this reordering must be 
invertible, otherwise x would be infeasible and hence not a vertex.) There¬ 
fore, x + u j (for any j = 1,... ,r — n) has r — n — 1 zero elements, is integer 
valued and, if feasible, lies on an edge of the polytope. Now, for any general 
point x G Xj y that is not a vertex, convexity of the polytope ensures that 
movement must be possible parallel to some edge. In order to prove that 
the sampler will mix, it remains to show that the sampler cannot get stuck 
at a vertex. There are two possibilities. If movement is not possible along 
any edge leading from x, then this vertex must be the sole element of Xj y , 
in which case the route flows are uniquely determined by the link counts. 
If movement is possible parallel to the j th column of U, then x + u j G Xjy. 
That we can take an integer-valued step in that direction is assured by 
Lemma 1. 

We have proved the following: 

Proposition 2. Given any feasible integer-valued flow vector either 
(i) x is the sole element of X | y ; or 
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(ii) there exists a matrix partition A = [A \, A 2 ] and corresponding matrix 
U from (12) such that x + u j £ X\ y for some 1 <j<i — n. 

This result ensures that the sampler will always have a feasible integer¬ 
valued move in at least one coordinate direction, but provides no guidance as 
to whether movement is possible in any given direction. The next proposition 
provides a sufficient condition that tallies with our earlier intuition about 
the need for slack in the swap space. Specifically, if there is flow on all the 
routes corresponding to the first n columns of A, then the sample has feasible 
moves parallel to all the coordinate axes defined by U. 


Proposition 3. Let X € X\ y with Xi > 0 for i = 1,..., n. Then for each 
j = 1,..., r — n, x + Uj is a feasible route flow vector. 


The proof is given in the Appendix. 

Altering the partition of A corresponds to a change in the (r — n)- 
dimensional coordinate system representation of Xj y . In particular, we can 
choose a representation in which one of the axes is parallel to any given 
edge. This immediately explains how we can hope to avoid the difficulties 
encountered in Example 3. The problem in Figure 2 is the orientation of 
the polytope, rather than the fact that it is long and thin. Let us switch 
columns 4 and 5 of A to give 


(13) 


A = [Ai|A 2 ] = 


'1 

1 

1 

0 

0 

o' 

1 

1 

1 

1 

1 

1 

0 

1 

1 

1 

0 

1 

1 

0 

0 

1 

0 

0 

1 


The polytope Xj y is now represented in terms of the route flows from node 
2 to node 3 (column 5) and from node 2 to node 5 (column 6). The resulting 
feasible regions in this coordinate system are displayed in Figure 3 for the 
cases y = (10,20,20,10) T and y = (10,20,19,9) T . Clearly, sampling parallel 
to the coordinate axes will be efficient. 

The preceding theory implies that convergence of the sampler can be 
guaranteed if we update the partition of A from iteration to iteration in a 
suitable manner. In particular, a sufficient condition for irreducibility will be 
that every possible partition A = [Ai|A 2 ] (with A\ invertible) is employed 
infinitely often in the long run. This could be achieved by systematically 
cycling through all possible partitions or by sampling the partition at each 
iteration. Such sampling would need to place nonzero probability on each 
partition, but would work best if there is a bias toward selecting partitions 
in which the first n routes tend to carry high flows. 

A direct implementation of this methodology would be feasible in exam¬ 
ples with modest numbers of routes. However, in large examples the need 
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Fig. 3. Feasible route sets in terms of flows for revised routes 5 (from node 2 to node 3) 
and 6 (from node 2 to node 5) for the network in Figure 1. The left-hand panel corresponds 
to the case y = (10,20,20,10) T ; the right-hand panel to y = (10,20,19,9) T . The interior of 
the convex hull of each feasible set is shaded as an embellishment to help emphasize shape. 

to repeatedly find acceptable partitions of A would be computationally im¬ 
practical. Nonetheless, in such cases Proposition 3 will generally come to 
the rescue, since it implies that we need only find a single good partition 
that can then be used unchanged thereafter. Specifically, if we can find n 
linearly independent columns of A for which the corresponding route flows 
have negligible posterior weight at (and preferably near to) zero, then the 
sampler will work adequately if these columns are selected to form A\. 

With these comments in mind, we recommend a phased approach. We 
start with some initial partition of A and run the sampler for a fixed number 
of iterations. At the end of this first pilot phase we compute pilot estimates 
from the sampled route flows and use these to determine a suitable permu¬ 
tation of the columns of A. In the next pilot phase we run the sampler for 
another fixed number of iterations, refine our estimates of the route flows and 
update the partition of A accordingly. This process can be continued until 
we have discovered a suitable route ordering, although in practice we have 
found that two pilot phases are typically sufficient. Once the pilot phases 
are completed, the sampler can run with no further changes to A. 

During this process, each update of the partition of A should be chosen so 
that the routes corresponding to the columns of A\ carry relatively high flow. 
There are two issues to consider. First, what statistic should we compute 
as a summary of the magnitude of the sampled flows for these routes? An 
obvious answer (and the one that we employ in later examples) is to use 
the mean flows from the pilot samples. Employing an estimate of a very low 
percentile is an alternative that relates directly to the desire to avoid very 
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small flows on these routes. The second issue concerns optimization of the 
partition of A based on the pilot estimates. In principle, we could search 
through all possible partitions of A to find the one for which the sum of the 
pilot estimates is largest, subject to the requirement that A\ is invertible. A 
cheap alternative (used in subsequent numerical work) is to employ a greedy 
sequential algorithm, where the set of columns of A\ is built up one route at 
a time, at each step choosing the highest flow route that is not in the span 
of the columns already selected. 

Using this cheap version of the phased approach results in an algorithm 
with almost exactly the same computational expense per iteration as the 
original methodology of Tebaldi and West (1998). The additional computing 
time required to generate a few additional matrix partitions is negligible. Of 
course, we expect our algorithm to have far better mixing properties in 
many applications, and when this is so, it will be far cheaper in terms of the 
computational cost per effective independent sample. 

A comparison with the computational cost of Airoldi and Blocker’s (2013) 
algorithm is a little more involved. This algorithm shares the same com¬ 
putational complexity as that of Tebaldi and West (1998) and the cheap 
(phased) version of our refinement thereof, in the sense that the problem 
of generating a new candidate route flow is fundamentally 0(r — n) in all 
cases. However, we expect the actual computing time per candidate route 
flow for Airoldi and Blocker’s (2013) algorithm to be more than twice that 
of ours because of the extra calculations required to sample along random 
search directions. Nonetheless, it is important to recognize that a proposed 
update of all components of the route flow vector (which we refer to as a 
single iteration in the numerical studies in the following section) requires 
generation of (r — n) candidates using our algorithm (one for each column 
of A 2 ), while a single candidate from Airoldi and Blocker’s (2013) algorithm 
can update all components of x simultaneously. The issue of which is more 
computationally efficient in practice will depend upon acceptance rates. We 
consider this matter a little further in the numerical example in Section 5.3, 
although it should be kept in mind that our algorithm and that of Airoldi 
and Blocker (2013) are only approximately comparable, in the sense that 
they are designed for discrete and continuous flows, respectively. 

5. Applications. Traffic models of the type that we have studied are used 
in practice to model networks of various scales, ranging from inter-urban 
motorway systems to individual urban road intersections. In this section 
we start by considering two applications of the latter type, which provide 
convenient examples to assess and illustrate our methods. We then go on 
to examine a larger section of road network. The first of the applications 
includes link count data from multiple days, but no prior information, so 
estimates are computed using maximum likelihood estimation through the 
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Fig. 4. An abstraction of the intersection of London Road (running left to right) and 
University Road in the English city of Leicester. 

stochastic EM algorithm. In the other examples we have data only from a 
single day, but informative priors are available, making Bayesian inference 
(via the MCMC algorithm) natural. All the applications are taken from the 
road system in the English city of Leicester. 

We consider three algorithms for route flow sampling at various points 
during this section. These are Tebaldi and West’s (1998) algorithm, our 
modification thereof (with at most two partition updates for A), and 
Airoldi and Blocker’s (2013) algorithm with output rounded to integer val¬ 
ues. Additional numerical results detailing trace plots, effective sample size 
and mean slack for these samplers are available as part of the supplementary 
material for this article [Hazelton (2015)]. 

5.1. Application 1: Maximum likelihood estimation for flows at an inter¬ 
section. The first application concerns the intersection of London Road 
with University Road in Leicester. An abstracted form of the physical net¬ 
work is displayed in Figure 4. All nodes are both origins and destinations of 
travel, except for node 2 which is neither. All links except for 2 are equipped 
with inductive loop counters. We have available traffic flows on all other links 
for the period 16:00-16:15 on five nonconsecutive weekdays in May. The data 
are displayed as line plots in Figure 5. 

We consider the problem of estimating the mean origin-destination flows. 
With only 6 routes [one for each of the O-D pairs = 1,3,4, i f 

j}] and 5 monitored links, the linear system (1) is only slightly under¬ 
determined. Moreover, the ordering of the columns of A is irrelevant, since 
the feasible route flow polytope is one-dimensional. The interest lies in 
whether a sample of size N = 5 is sufficient to allow useful inferences to 
be made about the mean route flows based on the likelihood alone (i.e., in 
the absence of prior information) and to what extent model misspecihcation 
may effect the results. 

We consider two models. The first is the Poisson model introduced in 
Example 1, while the second is a negative binomial model, parameterized so 
that E [xf = 9j and Varfar,] = (1 + a)9j for j = 1,..., 6. For both models we 
compute maximum likelihood estimates using the stochastic EM algorithm. 
The initial simulation size was set to M = 2000 (following a burn-in period 
of 2000 iterations). We then followed the strategy of Caffo, Jank and Jones 
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Fig. 5. Traffic counts for five (nonconsecutive) days for the network in Figure 4- 


(2005) to control increases in simulation size and provide the stopping rule. 
Standard errors were computed using (6). For comparison, we also calcu¬ 
late parameter estimates using Vardi’s (1996) method of moments approach 
applied only to the Poisson model. (This methodology cannot be employed 
for the negative binomial model because of the presence of the additional 
dispersion parameter.) The results are displayed in Table 1. 


Table 1 

Estimates of mean route flows for Application 1. Maximum likelihood estimates (MLEs) 
were calculated using the stochastic EM algorithm. MoM denotes Vardi’s (1996) method 
of moment estimator. For the negative binomial model, the maximum likelihood estimate 
(and standard error) of the dispersion parameter a was 1.92 (0.87) 


Route 

O-D 

Poisson model 

Negative binomial 
model MLE (std err) 

MLE (std err) 

MoM 

1 

1-3 

175.5 (7.0) 

420.2 

176.4 (12.9) 

2 

1-4 

41.5 (4.7) 

37.5 

41.0 (9.7) 

3 

3-1 

183.9 (7.1) 

155.4 

183.5 (13.1) 

4 

3-4 

10.1 (3.9) 

39.8 

10.6 (8.1) 

5 

4-1 

61.9 (5.1) 

49.7 

63.1 (10.4) 

6 

4-3 

14.5 (4.0) 

0.0 

12.8 (8.2) 
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The raw link counts suggest overdispersion with respect to a Poisson 
model, and this is borne out by the estimate of a = 1.92 ( SE = 0.87) for 
the dispersion parameter in the negative binomial model. Nonetheless, the 
maximum likelihood estimates of 6 are very similar for the two models. 

We note that there is a marked difference in the nominal standard errors 
obtained between the estimates from the Poisson and negative binomial 
models. In part this reflects the relative capabilities of the models to account 
for the aforementioned overdispersion. However, we also note that inaccuracy 
in the standard errors can be expected given that they rely on asymptotic 
likelihood theory being applied to data from just N = 5 time points. 

To examine the properties of maximum likelihood estimation in this appli¬ 
cation, we simulated 100 sets of route flows from a negative binomial model 
with parameters matching those estimated from the real data. We then 
computed the corresponding link count data sets and found the maximum 
likelihood estimates for each using the stochastic EM algorithm using both a 
(correctly specified) negative binomial model and an (incorrectly specified) 
Poisson model. From these results we calculated the (approximate) biases 
of the estimates. These were low in all cases. The estimated bias in 9j was 
less than 1% for routes i = 1,3,5, which carry the heaviest traffic, rising to 
a maximum (absolute) value of 3.7% for route i = 4, which carries the light¬ 
est flow. The differences in bias between the negative binomial and Poisson 
models were negligible. 

These results suggest not only that maximum likelihood can provide use¬ 
ful estimates in this application, but also that the estimates are quite robust 
to misspecification between negative binomial and Poisson models. Loosely 
speaking, maximum likelihood estimation based on the Poisson model privi¬ 
leges first order information over higher order information, in part evidenced 
by the fact that A0 po j S = y, where <? po is denotes the vector Poisson maxi¬ 
mum likelihood estimates and y = N — 1 The same comment does 

not apply to Vardi’s (1996) method of moments approach based on a Poisson 
model, which produces highly implausible estimates 0 ni om . For the results of 
the real data analysis reported in Table 1, the method of moments estimated 
vector of mean link counts A0 mom differs from y by more than a factor of 
two in some components. 

5.2. Application 2: Bayesian inference at an intersection. The second 
network that we consider describes an area around the junction of University 
Road and Regent Road. See Figure 6. All nodes are both origins and desti¬ 
nations of travel, except for node 4 which is neither. All links except for 5 are 
equipped with inductive loop vehicle detectors. In this case we have available 
only a single set of traffic counts, y = (72,56,217,120,119,127,178,117,181) T , 
again collected on a weekday in May. We aim to estimate the mean route 
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Fig. 6. An abstraction of the network around the intersection of Regent Road (running 
left to right) and University Road. 


flows 0 via the Poisson model from Example 1. Prior information is essential 
in this case and is available from an outdated traffic survey. We incorporate 
this in the form of pseudo route traffic counts x through independent Gamma 
priors with Oj ~ Gamma (V 2,1/2). 

We initialized the MCMC algorithm from the previous section with the 
column order in A randomized, subject to the requirement the A\ be invert¬ 
ible in the usual partition. This gave 

A = [-Ai|-A 2 ] 

( 14 ) 
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The initial flow pattern x was generated through integer programming as 
the solution to (1) with maximum L\ norm. 

The two pilot phases of the MCMC algorithm ran for 10,000 iterations 
each. At the end of the second phase the partition of A was updated so that 
the final column permutation was a = (17, 7,13,1,20,6,11,10,9,2,14,12,19, 
16,8,18,3,15,4,5) in comparison to the initial ordering. The algorithm then 
ran for a further burn-in period of 10,000, followed by a further 10,000 iter¬ 
ations from which posterior estimates were computed. The computing time 
for this complete simulation was 37 seconds, with the algorithm coded in R 
[R Core Team (2013)] running on a 32-bit Windows desktop computer with 
a dual core 3.6 GHz processor and 4 GB of memory. Trace plots for all iter¬ 
ations for 0j and Xj appear in Figures 7 and 8 for an illustrative selection of 
routes, specifically, those numbered j = 1,2,9,13, based on ordering of the 
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Fig. 7. Trace plots for mean route flows {Oj ,j = 1,2,9,13}. Routes are numbered accord¬ 
ing to the columns of A in (14), with the matrix of plots filled by row. Trace plots for all 
20 routes are available as supplementary material. 


columns in (14). We see that while the algorithm is not entirely stuck during 
the first pilot phase, nonetheless only some route flows are successfully up¬ 
dated. It follows that the unmodified version of Tebaldi and West’s (1998) 
sampler would fail to converge to the correct posterior distribution in this 
application. 

The posterior means with corresponding 95% credible intervals appear 
in Table 2 alongside the prior values. Comparing the prior and posterior 
means, the overall level of traffic is very similar, but there are some marked 
differences in the pattern of flows. 

5.3. Application 3: Inference for a larger network. We now turn to a 
larger application based on a section of the city road network just to the 
southeast of the center of Leicester. See Figure 9. This network has 21 nodes 
and 50 links. A total of 85 O-D pairs with an aggregate of 127 routes were 
considered, based on earlier analyses of this network [e.g., Hazelton (2001)]. 
A single set of traffic counts y is available on 27 of the network links. See 
Table 3 for details. As before, prior information is essential and is available in 
the form of pseudo route traffic counts x based on an outdated survey. These 



Fig. 8. Trace plots for sampled route flows {xj,j = 1,2,9,13}. Routes are numbered 
according to the columns of A in (14), with the matrix of plots filled by row. Trace plots 
for all 20 routes are available as supplementary material. 
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Table 2 

Prior and posterior means, 95% posterior credible intervals, for mean route flows for 
Application 2 using a Poisson model 


Route 

O-D 

Prior mean 

Posterior mean 

95% Cl 

1 

3-6 

65.0 

39.5 

(27.9,52.3) 

2 

1-3 

33.0 

20.2 

(12.1,30.2) 

3 

5-2 

67.0 

59.5 

(42.7,78.7) 

4 

6-1 

38.0 

26.4 

(16.3,38.4) 

5 

3-1 

28.0 

21.9 

(12.4,33.1) 

6 

2-6 

30.0 

38.3 

(25.2,53.1) 

7 

6-2 

37.0 

59.8 

(41.8,78.9) 

8 

1-5 

9.0 

5.5 

(1.7,11.4) 

9 

1-6 

30.0 

20.5 

(11.8,31.3) 

10 

2-5 

37.0 

33.8 

(22.2,47.5) 

11 

2-3 

37.0 

36.7 

(26.2,48.9) 

12 

3-5 

20.0 

10.7 

(5.1,18.2) 

13 

3-2 

20.0 

51.6 

(37.8,67.0) 

14 

1-2 

2.0 

15.7 

(6.3,26.3) 

15 

5-6 

20.0 

28.0 

(16.5,41.5) 

16 

2-1 

2.0 

6.4 

(0.3,15.4) 

17 

6-5 

31.0 

66.7 

(49.0,86.2) 

18 

5-3 

10.0 

4.7 

(1.5,9.5) 

19 

6-3 

15.0 

8.0 

(3.4,14.3) 

20 

5-1 

69.0 

38.9 

(27.7,52.1) 


are used to define independent Gamma priors with 6j ~ Gamma (xj/ 5,1/5). 
We note that this type of 0-D matrix updating problem is very common in 
transport planning and modeling. 

We conducted Bayesian inference for the mean route flows 6 using a 
Poisson route flow model, Xj ~ Pois(0j) independently for j = As 

in the previous application, the two pilot phases of our MCMC sampler 
ran for 10,000 iterations each. The algorithm then ran for a further 20,000 
iterations. The total computing time was slightly less than 10 minutes using 
the same computing environment as described in the previous application. 

In Figure 10 we display the trace plots for Xj for three selected routes. We 
also display the mean flow on the routes corresponding to partition A\ as a 
measure of slack in the linear system (1). The sampler is almost completely 
stuck during the first pilot phase, with only 16 of the 127 routes updated 
(in the sense of a change of value) at any stage in the first 10,000 iterations. 
The situation is much improved in the second pilot phase, with all route 
flows updating. Nonetheless, the mixing of the sampler during this phase is 
relatively poor. A second optimization of the route ordering leads to better 
mixing throughout the final 20,000 iterations of the algorithm. 
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Fig. 9. A section of the road network in Leicester, just to the southeast of the city center. 


As expected, the performance of the sampler at the various phases is 
mirrored by the mean slack in the swap space. To provide further insight 
into this effect, consider breaking down the first 30,000 iterations into equally 
sized blocks of 10,000 iterations. These correspond to the three partitions of 
A that are employed. Over the first block the mean slack is 35.9, over the 
second block the mean slack is 46.4, and over the third block the mean slack 
is 163.2. Using the final block as a benchmark, the sampling efficiencies in 
the first two blocks (measured in terms of mean effective sample size for the 
traffic flows on the three selected routes) are 0% and 13.5%, respectively. 

These results show that implementation of Tebaldi and West’s (1998) al¬ 
gorithm without modification of the route order fails completely to converge 
based on the initial partition of A. Moreover, even using the route ordering 


Table 3 

Available link count data for the network in Figure 9 


Link 

1 

2 

5 

6 

7 

11 

13 

14 

16 

Count 

1279 

740 

1112 

826 

1221 

1147 

1066 

764 

835 

Link 

18 

21 

22 

25 

27 

29 

31 

32 

34 

Count 

462 

137 

193 

746 

685 

466 

538 

499 

453 

Link 

35 

36 

37 

38 

39 

40 

42 

46 

47 

Count 

610 

503 

667 

483 

545 

500 

57 

194 

111 
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Fig. 10. Trace plots for sampled route flows for the network in Figure 9. The top panel 
displays the mean flow for routes corresponding to partition A\, and hence is a measure 
of slack. The remaining three panels display sampled flows for three selected routes. 


during the second pilot phase (which is partially optimized) gives a sampler 
with quite poor mixing properties. 

In order to check that these problems were not a consequence of a patho¬ 
logical data set or initial route ordering, we ran a small simulation study 
in which link counts were generated from a Poisson model fitted using the 
posterior mean route flows and where the columns of A were ordered at 
random (subject to the constraint that A\ is invertible). In 100 replications, 
the unmodified version of Tebaldi and West’s (1998) algorithm failed to mix 
on every occasion, in the sense that there was at least one route flow that 
was not updated. In contrast, use of our algorithm with two updates of the 
partition of A led to acceptable mixing in every replication. This indicates 
that while there are a very large number of possible partitions for A, it is 
necessary to search carefully for ones that produce a sampler with good 
properties. 

We also tried applying the random search direction sampler of Airoldi 
and Blocker (2013) to this application, based on code harvested from the 
R library networkTomography [Blocker, Koullick and Airoldi (2012)]. This 
methodology is intended for continuous flows, and so we employed an ap¬ 
proximation in which the likelihoods in the acceptance probability were com¬ 
puted using rounded versions of the sampled route flows. We implemented 
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Route ordering A Route ordering B 




Fig. 11 . Trace plots for sampled route flows for the network in Figure 9 using the random 
search direction algorithm of Airoldi and Blocker (2013). The left-hand panels show results 
obtained using route ordering A; those on the right-hand side show results obtained using 
the route ordering B. 


this sampler using two route orderings. The first (“ordering A”) matches that 
used for the first pilot phase in our algorithm, while the second (“ordering 
B”) is the final (optimized) route ordering found by our algorithm. We thin 
the output, retaining only every (r — n)th (i.e., 77th) iteration, to create a 
fair comparison with the results from our algorithm (where one iteration in¬ 
volves updating the flows on routes corresponding to all 77 columns of A 2 ). 
Thinned trace places for flows on selected routes (corresponding to those in 
Figure 10) are given in Figure 11. The computing time was approximately 
2.5 times slower than that for our algorithm. 

It is evident from these results that Airoldi and Blocker’s (2013) algo¬ 
rithm works far better than the Tebaldi and West (1998) algorithm for the 
initial partition of A. However, the trace plots also indicate that while the 
properties of Airoldi and Blocker’s sampler are improved through a refined 
route ordering, the sampler mixes somewhat less well than our algorithm 
with the optimized partition of A. This result ties in with our examination 
of Example 3 in Section 4. Componentwise sampling fails entirely, or mixes 
extremely slowly, when the partition of A gives rise to a polytope with 
awkward geometry. However, when the partition is updated to maximize 
the slack, then “long moves” are possible in the coordinate directions, and 
sampling in those directions can be preferable to random search directions. 
Further insight is provided by a comparison of the slack for all the sampling 
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algorithms considered. See the supplementary material for details [Hazelton 
(2015)]. 

6. Discussion. The availability of a dependable method for sampling 
route flows conditional on an observed pattern of link counts is pivotal 
to estimation of origin-destination traffic volumes and associated statisti¬ 
cal network tomography problems. As we have shown, implementation of 
Tebaldi and West’s (1998) proposed sampler with a fixed partition of the 
routing matrix is unreliable because the polytope of feasible route flows 
may be oriented at an awkward angle to the sampling directions. Nonethe¬ 
less, the difficulties are resolved by a change of coordinate representation of 
the polytope through a reordering of the columns of A. Indeed, given that 
there is always a good route ordering available (if A is totally unimodular), 
componentwise sampling of the elements of X 2 is adequate. This is fortunate, 
since we speculate that it would be very difficult to develop an exact sampler 
[as opposed to a continuous approximation like that of Airoldi and Blocker 
(2013)] to draw candidate route flows from higher dimensional spaces when 
the traffic is integer valued. 

The unimodularity requirements on A place a caveat on the preceding 
remarks, although not a serious one. In practice, we require only that a 
good route ordering is available for which A\ is unimodular (and hence 
invertible as an integer-valued matrix). This is guaranteed if A is totally 
unimodular, but may well occur even when this is not the case: most of the 
Ai submatrices can be unimodular even if A is not totally unimodular. It 
follows that total unimodularity is a sufficient, but by no means necessary, 
condition for the proposed sampling algorithm to work effectively. 

The previous comments notwithstanding, it is still of interest to explore 
further the issue of total unimodularity. As Airoldi and Blocker (2013) in¬ 
dicate, the routing matrices that are encountered in practice seem to be to¬ 
tally unimodular almost without exception. Nonetheless, one does not have 
to work too hard to find a network tomography problem for which this is not 
the case. Consider the network displayed in Figure 12. Suppose that traffic 
counts are observed on links 1, 2 and 3 only, and that travel is possible for 
O-D pairs (1,3), (2,1), (3,2) and (3,4) via the obvious acyclic routes. 

Based on that route ordering, the link-path incidence matrix is given by 




'1 

0 

1 

o' 

(15) 

A = [Ai \A 2 ] = 

1 

1 

0 

0 



0 

1 

1 

1 
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Fig. 12. A potentially problematic network. If traffic counts are available for only links 
1, 2 and 3, and if travel is possible only between the ordered node pairs (1,3), (2,1), (3,2) 
and (3,4), then the link-path incidence matrix is not totally unimodular. 


The submatrix A is not unimodular [det(Ai) = 2], and thus admits a noninteger¬ 
valued inverse matrix, 


A 


-i 

l 



1 

2 
1 
2 

1 

2 



An immediate consequence is that if we attempt to implement the condi¬ 
tional route sampler using A from (15), then the resultant traffic flows need 
not be integer valued. However, by switching any of the first three columns 
with the fourth, we obtain new partitions A = [A 1 IA 2 ] where the elements 
of Af ] lie in the set {—1,0,1}. Our route flow sampler works successfully 
when only these partitions of A are considered. 

It is interesting to reflect on the characteristics of this problem that led 
to the lack of total unimodularity in (15). The routing scheme is unusual 
(and rather artificial) because travel is only possible over paths comprising 
exactly two links. Since traffic counts are available for links 1, 2 and 3, the 
result is a submatrix A\ in (15) where all the column sums equal two. No 
routing submatrix with this property can be unimodular, a result that can 
be generalized as follows. 


Proposition 4. If A± is unimodular, then its column sums are coprime. 

The proof is given in the Appendix. 

When we reverse the final two columns of A, the column sums of the 
resulting submatrix A\ are coprime and A\ is unimodular. Nonetheless, this 
coprime property is insufficient to ensure unimodularity in general. For ex¬ 
ample, suppose that we connect the networks in Figures 1 and 8 by inserting 
a link joining node 4 of the former to node 3 of the latter. If we leave the 
pattern of permissible O-D pairs and routes unchanged, then we may order 
the routes so that the submatrix A\ for the composite network is block di¬ 
agonal, with the Ai matrices from the original matrices forming the blocks. 
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The combined matrix A\ will have coprime column totals, but will obviously 
not be unimodular. 

This example is somewhat extreme, however. While the composite net¬ 
work is physically connected, it operates as two independent (sub)networks 
since there are no routes requiring travel from one to the other. Whether uni¬ 
modularity of A\ can be guaranteed by imposing some natural assumptions 
on the routes of the network remains unclear. 


APPENDIX 


Proof of Proposition 1. Let y denote the (unconditional) support of 
y. It suffices to show that /y(y|$) = /y(y|$) for all y £ y implies that 
/x(x \0) = /x(x \G) for all x £ X. By the independence assumption on the 
route flows, it is sufficient to show that fx :i (h\G) = fxj{h \G) for all h £ Z>q 
and j £lZ. 

Assume henceforth that jy(y|#) = fy(y\G) for all y £ y . Then using the 
fact that all columns of A contain at least one nonzero element, we obtain 

(16) fx( O|0) = M o|0) = /y(O|0) = fx(o\e), 


where 0 denotes an appropriately dimensioned vector of zeroes. 

The proof now proceeds by induction on route length. To this end, we 
define Sj to be the number of constituent monitored links for route j £ 1Z, 
that is, Sj = 11ay 11 1 = )Ci=i a *j> where a j denotes the jth column of A. Let 
s (i) < s ( 2 ) < • • • < s (p) denote the unique values of Sj in increasing order. 
Now partition the routes into (disjoint) sets Ji, J 2 , • • ■ > Jp , where Ji contains 
all routes comprising s^ links. 

Consider now x = hej for j £ J\, where ej is the jth coordinate vector 
(i.e., a vector of zeros except for a one in the jth position) and h £ Z>i. This 
is a route flow pattern with h vehicles on route j and none on any other 
route. Then 

fx (h\0) 

^>fx(om=f x (he, m 

= fY(ha.j\G) 

= f Y (ha.j\G) 

= fx(hej\~G) 


fxMG) 

fxi(0\G) 


fx(0\G), 


where the penultimate equality holds because x = hej is the unique solution 
of Ax = ha.j. To see this, note that there can be no solution involving shorter 
routes (there are none) or longer routes (since this would be incompatible 
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with a link flow pattern involving just smj links), and there can be no solution 
involving a route j' A j £ J\ because the columns of A are distinct. Applying 
equation (16), it follows that 

(17) fx J (h\o)f Xj m = f Xj (h\~e)f Xj (o\e) 

for h = 1,2,3.... Equation (17) also holds trivially when h = 0. We may 
therefore sum (17) over h £ Z>o to give 

fx j m = f Xj (o\e). 

It follows immediately from (17) that for j £ J\. fxAh\9) = /v. (h\9) for all 
he Z> 0 . 

For the purposes of induction, assume now that fx :i (h\9) = fxj{h\9) for 
all h £ Z>o for all j £ J\ U J 2 U • • • U Jk ■ Consider x = hej for j £ Jk+i ■ 
Starting with h = 1, we have 

f Y (a j \0) = f x (e j \e)+ Y /x(x \0). 

Ax.=a.j 


Now, if x A e j satisfies Ax = a j, then Xj = 0 for all j £ J* for i > k [using 
exactly the same kind of argument as preceded equation (17)]. It follows 
that, for any such x, 


(18) 


fx(x\0) = fx(0\e)ll 
i&J t 


fx t m 

/w( O|0) 


for some indexing set J' C Ji U • • • U Jk■ There is no need to specify this set 
precisely: it is sufficient to know that for j £ jf, fxj(h\9) = fx :j (h\6) for all 
h by our inductive hypothesis. 

Hence, from equation (18), 


fY{* j \0) = fx{e j \O)+ Y 

Ax.= SLj 


= fx(e j \6)+ Y /a(x|0) 
Ax.=a.j 


= fx(ej\0) + f Y (a.j\0) - f x (ej\9). 

Since /y (a ? \0) = /y (a ? \9), it follows that fx{ej\9) = fx(ej\9) and, there¬ 
fore, 


fXj(M9) 

fxj(0\9) 


fx(0\9) 


fx 0 A\9) 

fxMd) 


fx(O\0) 
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and so 

fx t ( 1|«) fx t ( 1|«) 

<19> 7^m = 7^wy 

courtesy of equation (16). 

We continue by applying another “inner” mathematical induction, to 
demonstrate that equation (19) applies for flows h > 1. For the purposes 
of induction, assume that 

f Xj {h*\e) f Xj (h*\~e) 
fx*m fxj(o\ 0 ) 

for all h* = 1, 2 ,..., (h — 1). 

Now, 

(20) f Y (ha j \9) = f x (he j \9) + Y fx(*\0). 

Ax=ha.j 

For every x ^ hej such that Ax = h&j , 


( 21 ) 


h -1 


fx(x\9) = f x (0\e) n 
h*o =0 


fx^Kie) 

fx 3 m 


n 

ieJ t 


fx^m 

f Xi (0\0) ’ 


where {h*} is a set of positive integers no greater than h — /iq, and A C Ji U 
• • • U Jy. (again requiring no explicit specification). Intuitively, this equation 
relies on the fact that the link flow pattern haj can be generated by placing 
/iq vehicles on route j and then splicing together flows on compatible shorter 
routes to account for the remaining h — vehicles. 

The inductive hypotheses imply that equality is maintained in equation 
(21) if 9 is replaced by 9 everywhere on the right-hand side. It follows that 
/x(x|0) = /x(x|0), when from (20) we obtain 


f Y (haj\9) = f x (hej\9)+ Y fx(*\9) 

x^hej 

Ax=ha.j 

= f x (hej\9)+ Y fx(x\9) 

x^hej 

Ax.=ha.j 

= f x (hej\9) + f y {ha 0 \9) - fx(h ej \9) 
= f x (he j\9) + f Y (haj\9) - fx(h ej \9). 
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It follows that fx(hej\0) = f x (hej\Q) and so 

f Xj (h\G) f x .(h\0) 
fXji O|0) /x,(O|0)’ 

completing the inner inductive step. 

We have proved that f Xj ( h\0)f Xj (O|0) = f Xj (h\d)f Xj (O|0) for all h G Z> 0 . 
Summing over h gives f Xj (0\9) = f Xj (0\9) when it follows that f Xj (h\6) = 
f Xj (h\6) for all j G Jk+i- This completes the outer mathematical induction. 

We conclude that f Xj {h\6) = f Xj (h\0) for all j £ TZ and h G Z>o, com¬ 
pleting the proof of Proposition 1. 

Proof of Proposition 3. By Lemma 2.2 of Airoldi and Haas (2011), the 
matrix U is totally unimodular, and therefore all its entries lie in {—1,0,1}. 
Hence, if ui j is the vector formed from the first n elements of u ; , we have 
xi + uij > 0 (interpreted componentwise), and hence x + Uj G X\ y as re¬ 
quired. 

Proof of Proposition 4. Let A\ be unimodular, and let a* be the vector 
of column sums of this matrix. Suppose that the elements of a* are not 
coprime and so have a greatest common divisor of d > 1. Then the elements 
of the vector are integers because A^ 1 is an integer-valued matrix. 

However, 

d _ 1 ajHj ” 1 = dT 1 ! 1 AiA ^ 1 = d - 1 l T , 
providing a contradiction, and hence proving the result. 
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SUPPLEMENTARY MATERIAL 

Supplement to “Network tomography for integer-valued traffic” (DOI: 
10.1214/15-AOAS805SUPP; .pdf). The supplementary materials, stored as 
a zip archive, include data and additional numerical results for the applica¬ 
tions in Section 5. The data comprise link-path incidence matrices, observed 
traffic counts and prior pseudo counts for Bayesian analyses. The additional 
results include effective sample sizes for MCMC output, computing times 
and summaries of the slack for the route flow samplers considered. 
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